Data Scheduling on Processor-In-Memory Arrays Based on Data Placement and Data Movement

نویسندگان

  • Yi Tian
  • Edwin H.-M. Sha
  • Chantana Chantrapornchai
  • Peter M. Kogge
چکیده

In the study of PetaFlop project, Processor-In-Memory array was proposed to be a target architecture in achieving 1015 oating point operations per second computing performance. However, one of the major obstacles to achieve the fast computing was interprocessor communications, which lengthen the total execution time of an application. A good data scheduling, consisting of nding initial data placement and data movement during the run-time, can give a signi cant reduction in the total communication cost and the execution time of the application. In this paper, we propose e cient algorithms for the data scheduling problem. Experimental results show the e ectiveness of the proposed approaches. Compared with default data distribution methods such as row-wise or column-wise distributions, the average improvement for the tested benchmarks can be up to 30%. This work was partially supported by NSF MIP 95-01006 and NSF ACS 96-12028. 1

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimizing Data Scheduling on Processor-in-Memory Arrays

In the study of PetaFlop project, Processor-In-Memory array was proposed to be a target architecture in achieving 10 floating point operations per second computing performance. However, one of the major obstacles to achieve the fast computing was interprocessor communications, which lengthen the total execution time of an application. A good data scheduling, consisting of finding initial data p...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

متن کامل

A Framework for Memory-aw Application Mapping on Chip

The relentless increase in multimedia embedded system application requirements as well as improvements in IC design technology have motivated the deployment of chip multiprocessor (CMP) architectures. Task scheduling and data placement in memory are two of the most important steps in the application customization process as they greatly influence overall power consumption, and performance. Most...

متن کامل

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997